Perform wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm

Authors

  • Nandan Mirajkar
  • Sandeep Bhujbal
  • Aaradhana Deshmukh
Abstract

Applications such as Yahoo, Facebook, and Twitter hold huge volumes of data that must be stored and retrieved on client request. Storing this data requires very large databases, which increases physical storage and complicates the analysis needed for business growth. Apache Hadoop can reduce the required storage capacity and process the data in a distributed fashion: it uses the Map-Reduce algorithm and combines repeated data so that the entire data set is stored in a reduced format. The paper describes performing a wordcount Map-Reduce job on a single-node Apache Hadoop cluster and compressing the data using the Lempel-Ziv-Oberhumer (LZO) algorithm.
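As a rough illustration of the wordcount idea, the map, shuffle, and reduce phases can be sketched in plain Python. This is not the paper's Hadoop job: all function names here are hypothetical, the phases run in a single process rather than on a cluster, and the standard-library `zlib` module stands in for LZO, which is not part of the Python standard library.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


def compress_counts(counts):
    """Serialize counts as tab-separated lines and compress them.

    Assumption: zlib is used only as a stand-in for LZO.
    """
    text = "\n".join(f"{word}\t{count}" for word, count in sorted(counts.items()))
    return zlib.compress(text.encode("utf-8"))


if __name__ == "__main__":
    sample = ["hadoop stores big data", "hadoop processes big data"]
    counts = word_count(sample)
    print(counts)
    print(len(compress_counts(counts)), "compressed bytes")
```

Repeated words collapse into a single key with a summed count, which is the sense in which repeated data is "combined" before the output is compressed.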

Similar resources

Lossless Message Compression Bachelor Thesis in Computer Science

In this thesis we investigated whether using compression when sending inter-process communication (IPC) messages can be beneficial or not. A literature study on lossless compression resulted in a compilation of algorithms and techniques. Using this compilation, the algorithms LZO, LZFX, LZW, LZMA, bzip2 and LZ4 were selected to be integrated into LINX as an extra layer to support lossless messa...

A decompression pipeline for accelerating out-of-core volume rendering of time-varying data

This paper presents a decompression pipeline capable of accelerating out-of-core volume rendering of time-varying scalar data. Our pipeline is based on a two-stage compression method that cooperatively uses the CPU and GPU (graphics processing unit) to transfer compressed data entirely from the storage device to the video memory. This method combines two different compression algorithms, namely ...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...

The Key as Dictionary Compression Method of Inverted Index Table under the Hbase Database

Starting with Hbase's own characteristics, this paper designs an inverted index table that includes key word, document ID, and position list; the table saves a lot of storage space. Then, on the basis of the table, the paper provides key-as-dictionary compression with a high compression ratio and a high decompression rate for the data block. Finally, this paper tests the effectiveness...

Research on Job Scheduling Algorithm in Hadoop

Based on an in-depth study of the Fair Scheduling Strategy in Hadoop clusters, the Node Health Degree is defined by constructing the relationship function between node load and job failure rate, and a job scheduling algorithm based on Node Health Degree is proposed in this paper. Nodes are grouped, according to Node Health Degree, into three categories in order to assign corresponding job in accord...

Journal:
  • CoRR

Volume: abs/1307.1517

Publication date: 2012